-
Notifications
You must be signed in to change notification settings - Fork 1.3k
[Metal] Make fast math mode optional #3205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ed on os version.
|
My build now completes on older MacOS, which is awesome :) However, It seems to me that the fast-math flag check should be moved to build-time, as having to set an envvar in production is cumbersome and error-prone. I've asked Codex and it it suggests these two approaches, first is IMO cleanest via a feature flag: diff --git a/candle-metal-kernels/Cargo.toml b/candle-metal-kernels/Cargo.toml
index 8ce09a32..c616d87d 100644
--- a/candle-metal-kernels/Cargo.toml
+++ b/candle-metal-kernels/Cargo.toml
@@ -9,6 +9,9 @@ keywords = ["blas", "tensor", "machine-learning"]
categories = ["science"]
license = "MIT OR Apache-2.0"
+[features]
+default = ["fast-math"]
+fast-math = []
[dependencies]
half = { version = "2.5.0", features = [
diff --git a/candle-metal-kernels/src/kernel.rs b/candle-metal-kernels/src/kernel.rs
index f941e302..bebdd187 100644
--- a/candle-metal-kernels/src/kernel.rs
+++ b/candle-metal-kernels/src/kernel.rs
@@ -2,7 +2,6 @@ use crate::source::{
AFFINE, BINARY, CAST, CONV, FILL, INDEXING, MLX_GEMM, MLX_SORT, QUANTIZED, RANDOM, REDUCE,
SDPA, SORT, TERNARY, UNARY,
};
-use crate::utils::get_env_bool;
use crate::{
ComputePipeline, ConstantValues, Device, Function, Library, MTLCompileOptions,
MTLMathFloatingPointFunctions, MTLMathMode, MetalKernelError, Source,
@@ -182,7 +181,7 @@ fn get_compile_options() -> Retained<MTLCompileOptions> {
let compile_options = MTLCompileOptions::new();
//unsafe { compile_options.setEnableLogging(true) };
- let fast_math_enabled = get_env_bool("CANDLE_METAL_ENABLE_FAST_MATH", true);
+ let fast_math_enabled = cfg!(feature = "fast-math");
// Ref availability:
// https://developer.apple.com/documentation/metal/mtlcompileoptions/mathmode
if available!(macos = 15, ios = 18) {
diff --git a/candle-metal-kernels/src/utils.rs b/candle-metal-kernels/src/utils.rs
index 1ad647d7..8cf4006a 100644
--- a/candle-metal-kernels/src/utils.rs
+++ b/candle-metal-kernels/src/utils.rs
@@ -1,6 +1,5 @@
use crate::metal::{Buffer, CommandBuffer, ComputeCommandEncoder, ComputePipeline};
use crate::MTLSize;
-use std::ffi::OsStr;
use std::ops::Deref;
use std::sync::{RwLockReadGuard, RwLockWriteGuard};
@@ -236,15 +235,3 @@ impl<'a, T> From<RwLockWriteGuard<'a, T>> for RwLockGuard<'a, T> {
fn from(g: RwLockWriteGuard<'a, T>) -> Self {
RwLockGuard::Write(g)
}
-}
-
-fn is_truthy(s: String) -> bool {
- match s.as_str() {
- "true" | "t" | "yes" | "y" | "1" => true,
- _ => false,
- }
-}
-
-pub(crate) fn get_env_bool<K: AsRef<OsStr>>(key: K, default: bool) -> bool {
- std::env::var(key).map(is_truthy).unwrap_or(default)
-}and second keeps the envvar, but resolves it during build: diff --git a/candle-metal-kernels/src/kernel.rs b/candle-metal-kernels/src/kernel.rs
index f941e302..6c07cf0c 100644
--- a/candle-metal-kernels/src/kernel.rs
+++ b/candle-metal-kernels/src/kernel.rs
@@ -2,7 +2,6 @@ use crate::source::{
AFFINE, BINARY, CAST, CONV, FILL, INDEXING, MLX_GEMM, MLX_SORT, QUANTIZED, RANDOM, REDUCE,
SDPA, SORT, TERNARY, UNARY,
};
-use crate::utils::get_env_bool;
use crate::{
ComputePipeline, ConstantValues, Device, Function, Library, MTLCompileOptions,
MTLMathFloatingPointFunctions, MTLMathMode, MetalKernelError, Source,
@@ -182,7 +181,7 @@ fn get_compile_options() -> Retained<MTLCompileOptions> {
let compile_options = MTLCompileOptions::new();
//unsafe { compile_options.setEnableLogging(true) };
- let fast_math_enabled = get_env_bool("CANDLE_METAL_ENABLE_FAST_MATH", true);
+ let fast_math_enabled = FAST_MATH_ENABLED;
// Ref availability:
// https://developer.apple.com/documentation/metal/mtlcompileoptions/mathmode
if available!(macos = 15, ios = 18) {
@@ -200,3 +199,9 @@ fn get_compile_options() -> Retained<MTLCompileOptions> {
}
compile_options
}
+
+const FAST_MATH_ENABLED: bool = match option_env!("CANDLE_METAL_ENABLE_FAST_MATH") {
+ Some("true") | Some("t") | Some("yes") | Some("y") | Some("1") => true,
+ Some(_) => false,
+ None => true,
+};
diff --git a/candle-metal-kernels/src/utils.rs b/candle-metal-kernels/src/utils.rs
index 1ad647d7..8cf4006a 100644
--- a/candle-metal-kernels/src/utils.rs
+++ b/candle-metal-kernels/src/utils.rs
@@ -1,6 +1,5 @@
use crate::metal::{Buffer, CommandBuffer, ComputeCommandEncoder, ComputePipeline};
use crate::MTLSize;
-use std::ffi::OsStr;
use std::ops::Deref;
use std::sync::{RwLockReadGuard, RwLockWriteGuard};
@@ -236,15 +235,3 @@ impl<'a, T> From<RwLockWriteGuard<'a, T>> for RwLockGuard<'a, T> {
fn from(g: RwLockWriteGuard<'a, T>) -> Self {
RwLockGuard::Write(g)
}
-}
-
-fn is_truthy(s: String) -> bool {
- match s.as_str() {
- "true" | "t" | "yes" | "y" | "1" => true,
- _ => false,
- }
-}
-
-pub(crate) fn get_env_bool<K: AsRef<OsStr>>(key: K, default: bool) -> bool {
- std::env::var(key).map(is_truthy).unwrap_or(default)
-} |
|
Great! Wrt having the flag set at buildtime I disagree. We don't have to supply the env var at all as we have a default. We want to be able to toggle it easily at runtime so that we can verify the correctness of the model implementations on metal. |
|
Additionally, I have a sneaking suspicion that this flag will be removed and that we'll disable fast math in the future. I haven't noticed any meaningful performance improvements from having it enabled vs not. |
|
I am happy as long as it unbreaks the build and is on by default :). In my experience, fast-math makes a huge difference on CPU for fp32 models, but not sure how much it matters for fp16 on GPU, and probably not at all for quantized models. That said, I have never seen any adverse effects of having it on. Wrt this being a build-time setting, I have always preferred to compile my shaders ahead of time, so there is naturally becomes part of the build |
Set math mode via old api if macos version < 15 or ios < 18.
Fixes #3185